Dengue is a dangerous disease that can lead to death if diagnosis and treatment are inappropriate. Its common symptoms include headache, muscle aches, fever, and rash. Dengue is endemic in several countries in South Asia and Southeast Asia. There are three varieties of dengue: dengue fever (DF), dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS). The disease can be classified using a machine learning approach with dengue symptoms as the input data. This study aims to classify dengue into three classes, DF, DHF, and DSS, using five classification methods: C.45, decision tree (DT), k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM). The dataset consists of 21 attributes, which are the dengue symptoms, collected from 110 patients. The methods were evaluated using cross-validation with k-folds of 3, 5, and 10, and performance was measured with three parameters: precision, recall, and accuracy. The best results were obtained using SVM with k-fold 3 and 10, with precision, recall, and accuracy all reaching 99.1%.
1. INTRODUCTION

Computer technology is now applied in many fields, including medicine, where it is used in expert systems [1]. Expert systems have been developed extensively over the last few years and continue to evolve because they can be integrated into clinical decision-making to predict disease and assist physicians in diagnosis. An expert system is a computer program that contains knowledge from one or more human experts about a particular disease.

Expert systems help patients obtain a diagnosis more efficiently based on the symptoms they experience. Moreover, they can be used at any time, which makes them more economical. These systems therefore help disseminate expert knowledge to a wider range of users. Expert systems can also give human experts an accessible and helpful way to develop and test new theories, especially in healthcare. The data used in expert systems vary and include images [2]–[4], signals [5], and medical record data such as the patient's name, age, laboratory test results, and symptoms [6]–[8]. In the medical field, expert systems have been developed for estimating drug doses [6], [9], monitoring disease progress [1], [2], [10], and detecting diseases such as diabetes mellitus [11], pancreatic cancer [12], breast cancer [13], glaucoma [14], and dengue fever (DF) [15]–[17].
Dengue fever is an arboviral disease caused by infection with one of the four serotypes of the dengue virus (DENV). According to the World Health Organization (WHO), the disease has an estimated global burden of 50 million illnesses annually, and about 2.5 billion people worldwide live in dengue-endemic areas [18]. Dengue fever, also known as breakbone fever, presents with various symptoms such as headache, muscle aches, fever, and a measles-like rash [16]. Dengue outbreaks have been reported in several countries. In Saudi Arabia, outbreaks occurred in the western and southern provinces, especially in the Jeddah and Mecca areas: the first in 2011, with 2,569 reported cases, and the second in 2013, with 4,411 cases including 8 deaths. Dengue has also occurred in other areas of Saudi Arabia, including Medina (2009) and Aseer and Jizan (2013) [18]. Meanwhile, the Malaysian Ministry of Health reports that dengue fever has grown rapidly since 2012. In 2015, the ministry published a report recording 107,079 cases of dengue fever with 293 deaths, compared with 43,000 cases and 92 deaths in 2013 [19]. The rapid spread of the dengue virus is increasingly dangerous, and addressing it should be treated as urgent. In Indonesia, the national incidence of dengue hemorrhagic fever (DHF) increased from 50.8 per 100,000 population in 2015 to 78.9 per 100,000 population in 2016 [20]. Clinically, the disease ranges from symptomatic dengue fever (DF) to a more severe form known as DHF, and the most fatal form is dengue shock syndrome (DSS) [18].

Dengue varieties have been classified using computer-based systems. The input data are the symptoms suffered by patients, such as fever, headache, pain behind the eyeball, joint pain, muscle pain, and other symptoms. In addition, the patient's thrombocyte and hemoglobin values are also indicative of dengue. A classification system is needed to quickly determine the dengue type from the patient's symptoms without consulting an expert or doctor. Such a system can be built with rule-based methods [21]–[24] or with machine learning [25]. Prior studies implemented machine learning-based classification with naive Bayes [6], logistic regression [12], random forest (RF) [8], [12], [16], k-nearest neighbor (KNN) [26], artificial neural network (ANN) [10], [13], and support vector machine (SVM) [8], [14].

One machine learning study used several approaches, including KNN, linear SVM, naive Bayes, J48, AdaBoost, bagging, and stacking, to classify autism spectrum disorder in adults. The best results, with an accuracy of 100%, were obtained with bagging, linear SVM, and naive Bayes, tested using cross-validation with k-fold 3, 5, and 10 [6]. Hepatitis has been classified using SVM variants, namely linear SVM, polynomial SVM, and Gaussian radial basis function (RBF) SVM, as well as RF, with a 90%/10% split between training and test data. The SVM and RF methods predicted the data correctly and achieved a best result of 0.995 [8]. Another study compared SVM kernels on a diabetes dataset using linear, polynomial, and RBF kernels. The linear kernel gave the best accuracy, 77.34%, while the RBF kernel gave the lowest, 65.10% [27].

In another study, the cup contour in retinal fundus images was extracted to detect glaucoma using multi-layer perceptron (MLP), KNN, naive Bayes, and SVM methods. SVM achieved the best accuracy, 94.44%, while MLP had the lowest, 72.22% [14]. For breast cancer, improving mammogram image quality based on the region of interest is needed to obtain optimal classification results; hybrid optimum feature selection (HOFS) was used for feature selection with an ANN as the classifier. Feature selection reduced the number of features while improving classification, giving accuracy, sensitivity, and specificity of 99.7%, 99.5%, and 100%, respectively [13]. Machine learning has also been used to classify antibiotic resistance into two classes, resistant and sensitive. The stacked ensemble technique produced the best results on the original and balanced datasets, with area under the curve (AUC)-weighted metrics of 0.822 and 0.850, respectively. The dataset attributes included sex, age, sample type, Gram stain, 44 antimicrobial substances, and antibiotic susceptibility values [7].

This study aims to classify dengue into three varieties: DF, DHF, and DSS. The input data are the symptoms caused by the disease. Classification is performed with five machine learning methods, C.45, DT, KNN, RF, and SVM, and evaluated using cross-validation. The rest of the paper is organized as follows: section 2 describes the dataset and methods, section 3 presents the results and discussion for each classification method based on the performance evaluation, and section 4 concludes the paper.
2. MATERIALS AND METHODS

This section describes the dataset and the classification methods, as well as the process for evaluating the performance of each method. The dataset, provided by Dirgahayu Hospital, Samarinda, Indonesia, consists of 110 cases of dengue patients divided into three classes, DF, DHF, and DSS, with 40, 61, and 9 cases, respectively. For each patient, the collected data include the patient code (Pcode), age, gender, the symptoms experienced, including thrombocyte and hemoglobin values, and the diagnosis made by the expert. Because the symptoms vary from patient to patient, the expert's diagnosis of dengue type also varies. Examples of the collected patient data are shown in Table 1.

This study consists of two stages, training and testing, each with two main processes: pre-processing and classification. In addition, an evaluation process measures each classifier's performance. Its inputs are the diagnosis made by the expert (actual class) and the output of the classification method (predicted class). An overview of the dengue classification method is depicted in Figure 1.


Table 1. Examples of dengue patient data

Pcode | Age | Gender | Symptoms | Diagnosis
KP001 | 7 | Male | Fever, headache, pain behind the eyes, skin rash, cough, vomiting, sore throat. Thrombocyte: 95,000, Hemoglobin: 32 | DF
KP002 | 8 | Male | Fever, pain behind the eyes, joint pain, skin rash, red eyes, vomiting, cough. Thrombocyte: 97,000, Hemoglobin: 39 | DF
KP003 | 17 | Female | Fever, headache, pain behind the eyes, muscle aches, skin rash, petechiae, bleeding manifestations, vomiting, diarrhea, abdominal pain, red eyes, jaw pain. Thrombocyte: 43,000, Hemoglobin: 49 | DHF
KP004 | 10 | Female | Fever, headache, pain behind the eyes, muscle aches, petechiae, bleeding manifestations, shock, anxious, vomiting, diarrhea, abdominal pain, red eyes. Thrombocyte: 4,000, Hemoglobin: 20 | DSS
KP005 | 16 | Female | Fever, headache, pain behind the eyes, joint pain, petechiae, bleeding manifestations, vomiting, diarrhea, abdominal pain, red eyes, jaw pain. Thrombocyte: 102,000, Hemoglobin: 41 | DHF
KP006 | 3 | Male | Fever, headache, pain behind the eyes, joint pain, skin rash, vomiting, cough, red eyes. Thrombocyte: 27,000, Hemoglobin: 32 | DHF
... | ... | ... | ... | ...
KP106 | 9 | Female | Fever, headache, pain behind the eyes, joint pain, nausea, red eyes. Thrombocyte: 166,000, Hemoglobin: 38 | DHF
KP107 | 9 | Male | Fever, headache, pain behind the eyes, joint pain, skin rash, vomiting, red eyes. Thrombocyte: 120,000, Hemoglobin: 32 | DF
KP108 | 47 | Female | Fever, headache, muscle aches, skin rash, petechiae, bleeding manifestations, vomiting, diarrhea, abdominal pain, jaw pain, inflammation. Thrombocyte: 63,000, Hemoglobin: 47 | DHF
KP109 | 9 | Male | Fever, headache, joint pain, muscle aches, skin rash, petechiae, bleeding manifestations, shock, anxious, vomiting, sore throat, cough. Thrombocyte: 77,000, Hemoglobin: 34 | DSS
KP110 | 20 | Female | Fever, headache, joint pain, muscle aches, skin rash, petechiae, bleeding manifestations, shock, anxious, vomiting, diarrhea, abdominal pain, cough, sore throat. Thrombocyte: 32,000, Hemoglobin: 30 | DSS


[Figure 1 diagram: training and testing data (diagnostic results) → pre-processing → classification (C.45, decision tree, KNN, random forest, SVM) → predicted class]

Figure 1. Overview of the processes in the dengue classification method
2.1. Pre-processing

Pre-processing was performed by discretization. The data in Table 1 had to be converted into numerical data before they could be used as input to the classification process. The patient data include 18 kinds of dengue symptoms (S): fever (S1), headache (S2), joint pain (S3), muscle soreness (S4), maculopapular skin rash (S5), petechiae (S6), bruising (S7), shock (S8), anxiety (S9), vomiting (S10), constipation (S11), diarrhea (S12), heartburn (S13), red eyes (S14), lower jaw discomfort (S15), cough (S16), sore throat (S17), and nasal cavity inflammation (S18), together with thrombocyte (T) and hemoglobin (H) values. Therefore, 20 attributes become the input to the next process, classification. Because the symptom data in Table 1 are not numerical, each symptom the patient experiences is given a value of 1, and each symptom the patient does not experience is given a value of 0. The thrombocyte (platelet) and hemoglobin values do not need to be pre-processed. After pre-processing, the data are ready for the classification process. The pre-processed data are shown in Table 2.
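The binary encoding described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the symptom names are shortened from the S1-S18 list in this section, and the function name `encode_patient` is hypothetical. Mapping the wording used in Table 1 (for example, "muscle aches" or "pain behind the eyes") onto these names would need an extra lookup that the text does not describe.

```python
# Symptom names in the S1-S18 order of Section 2.1 (shortened; assumed spelling).
SYMPTOMS = [
    "fever", "headache", "joint pain", "muscle soreness", "skin rash",
    "petechiae", "bruising", "shock", "anxiety", "vomiting", "constipation",
    "diarrhea", "heartburn", "red eyes", "lower jaw discomfort", "cough",
    "sore throat", "nasal cavity inflammation",
]


def encode_patient(symptoms, thrombocyte, hemoglobin):
    """Return the 20-attribute vector [S1..S18, T, H] for one patient.

    A symptom is 1 if the patient reports it and 0 otherwise; the
    thrombocyte (T) and hemoglobin (H) values are passed through unchanged.
    """
    reported = {s.strip().lower() for s in symptoms}
    binary = [1 if name in reported else 0 for name in SYMPTOMS]
    return binary + [thrombocyte, hemoglobin]


# A made-up patient, not a row from Table 1.
row = encode_patient(["Fever", "Headache", "Cough"], 95000, 32)
print(row)
```

Symptoms that do not match a name in the list are simply ignored here, which is one reason a real pipeline would need the explicit mapping mentioned above.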


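Table 2. The results of pre-processing

Patient | S1 | S2 | S3 | S4 | S5 | S6 | S7 | ... | S13 | S14 | S15 | S16 | S17 | S18 | T | H | Diagnosis
P1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 95000 | 32 | DF
P2 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 1 | 0 | 97000 | 39 | DF
P3 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 0 | 0 | 43000 | 49 | DHF
P4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 4000 | 20 | DHF
P5 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 102000 | 41 | DSS
P6 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 0 | 0 | 27000 | 32 | DHF
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
P106 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 1 | 0 | 1 | 0 | 166000 | 38 | DHF
P107 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | ... | 1 | 1 | 0 | 0 | 1 | 0 | 120000 | 32 | DF
P108 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 63000 | 47 | DHF
P109 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | ... | 0 | 0 | 1 | 0 | 1 | 0 | 77000 | 34 | DSS
P110 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | ... | 1 | 1 | 0 | 0 | 1 | 0 | 32000 | 30 | DSS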


2.2. Classification 

The classification process uses a machine learning approach. Machine learning is an area of artificial intelligence (AI) comprising techniques that allow computers to learn from empirical data, such as sensor data or databases [7]. Five classification methods are implemented in this study: C.45, DT, KNN, RF, and SVM. These methods have been successfully applied in several previous studies [6], [8], [16]. Cross-validation with k-fold 3, 5, and 10 is used to divide the data into training and testing sets [6]. Each classification method is explained in the following subsections.


2.2.1. K-nearest neighbor (KNN) 

KNN is a supervised machine learning algorithm that can address both classification and regression problems [24]. An input is assigned the class held by the majority of its K nearest neighbors. The KNN method must be run several times with different K values to find the K that minimizes errors while maintaining prediction accuracy. The nearest neighbors are found by brute-force search: the Euclidean distance in (1) is computed between the testing sample and every training sample, where $x_i$ and $y_i$ are the values of the $i$-th attribute of the testing and training samples, respectively, and $n$ is the number of attributes.


$$ d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{1} $$
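A minimal pure-Python sketch of brute-force KNN with the distance in (1) follows. The function names and the default K = 3 are illustrative assumptions; the paper does not report which K it used.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Distance d in (1): square root of the summed squared attribute differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_X, train_y, query, k=3):
    """Brute-force KNN: rank every training sample by distance to the query,
    then take a majority vote among the k nearest.

    Ties in the vote go to the label that appears first among the nearest neighbors.
    """
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 0/1 attribute vectors (illustrative, not patient data).
train_X = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_X, train_y, [1, 1, 1], k=3))  # A
```

Because thrombocyte counts are in the tens of thousands while symptoms are 0 or 1, an unscaled Euclidean distance over the 20 attributes would be dominated by T; the text does not say whether the attributes were normalized before applying KNN.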


2.2.2. Random forest (RF) 

RF is an ensemble machine learning method that is flexible and simple to use, and it often produces good results even without hyperparameter tuning [8]. RF is one of the most widely used algorithms because of its simplicity and versatility. An RF is a group of decision trees (DTs), each built on a random set of variables, whose outputs are combined. A DT is a flowchart-like tree structure [12]. For $p$ predictor variables, the predictors are represented by the random vector $x = (x_1, x_2, \ldots, x_p)^T$, while a random variable $y$ represents the response. Figure 2 illustrates the structure of an RF.

Much of RF's functionality rests on two data objects: the out-of-bag (OOB) data and the proximities. When the training set for each tree is drawn by sampling with replacement, about one-third of the instances are left out of the sample. These OOB instances give an unbiased estimate of the classification error and of the importance of each variable. After each tree is built, all data are passed down the tree and a proximity is computed for each pair of cases: whenever two cases occupy the same terminal node, their proximity increases by one. At the end of the run, the proximities are normalized by dividing by the number of trees. Proximities are used for outlier detection, missing-data replacement, and producing low-dimensional representations of the data [8].
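The OOB sampling, proximity counting, and final voting described above can be sketched as follows. This is an illustrative sketch, not the implementation used in the study: tree training is omitted, and `leaf_ids` (the terminal node each case reaches in each tree) is assumed to come from trees that were already built.

```python
import random
from collections import Counter


def bootstrap_with_oob(n, rng):
    """Draw n case indices with replacement (one tree's training sample).

    Returns (in_bag, oob); cases never drawn are out-of-bag and can be used
    to estimate the classification error without a separate test set.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def proximity_matrix(leaf_ids):
    """leaf_ids[t][i] is the terminal node that case i reaches in tree t.

    Each time two cases share a terminal node, their proximity grows by one;
    at the end, the totals are divided by the number of trees.
    """
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    prox[i][j] += 1
    return [[value / n_trees for value in row] for row in prox]


def majority_vote(tree_predictions):
    """Combine the trees' class predictions into the final RF result."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
_, oob = bootstrap_with_oob(10_000, rng)
print(f"OOB fraction: {len(oob) / 10_000:.3f}")  # close to 1/e, about 0.37
print(proximity_matrix([[0, 0, 1], [0, 1, 1]]))   # [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
print(majority_vote(["DF", "DHF", "DF"]))          # DF
```

The OOB fraction approaches $1/e \approx 0.368$ for large $n$, which is where the "about one-third" figure comes from.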


[Figure 2 diagram: the dataset feeds trees T1, T2, ..., TN; their individual results (Result 1 to Result N) are combined into the final result]

Figure 2. Illustration of the structure of an RF


2.2.3. C.45 decision tree (DT) 

The C.45 (C4.5) decision tree is a widely used machine learning method. C.45 generates a classification tree with a hierarchical structure in which internal nodes test attributes and leaf nodes give the resulting class. Classification with C.45 is effective and efficient, and the resulting tree is easy to visualize; however, C.45 is sensitive to noisy data. Other DT techniques include classification and regression tree (CART), chi-square automatic interaction detection (CHAID), and ID3. C.45 was therefore used in this study as one of the classifiers [6].
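C4.5 chooses the split attribute by gain ratio, that is, information gain divided by the split information of the attribute. Since the text above does not spell out the criterion, the sketch below is a general illustration for discrete attributes such as the 0/1 symptoms; C4.5's threshold search for continuous attributes such as T and H is not shown, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(column, labels):
    """C4.5 split criterion for a discrete attribute: information gain / split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data into many small groups.
    split_info = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


labels = ["DHF", "DHF", "DF", "DF"]          # toy labels
print(gain_ratio([1, 1, 0, 0], labels))      # prints 1.0 (attribute separates the classes)
print(gain_ratio([0, 1, 0, 1], labels))      # prints 0.0 (attribute carries no information)
```

At each node, the attribute with the highest gain ratio becomes the test.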


2.2.4. Decision tree (DT) 

A DT is a hierarchical, flowchart-like structure with three essential elements: decision nodes, which correspond to attributes; edges or branches, which correspond to the possible values of an attribute [28]; and leaves, which contain items that usually belong to the same class or are very similar. This representation makes it possible to define decision rules for classifying new instances: each path from the root to a leaf corresponds to a conjunction of attribute tests, and the tree as a whole can be seen as a disjunction of these conjunctions. Working with a DT involves two main procedures, building (induction) and classification (inference) [28]:

i) Build procedure: applied to the training data. A DT is typically built for a given training set by starting with an empty tree and using an attribute selection measure to choose a suitable test attribute for each decision node. The rule is to pick the attribute that most reduces class mixing within the training subsets produced by the test, making the classes easier to separate. The process is repeated for each sub-tree until leaves are reached and their class labels are assigned.

ii) Classification procedure: used to classify a new instance for which only the attribute values are known. Starting at the root of the built tree, the branch corresponding to the observed attribute value is followed at each internal node. This is repeated until a leaf is reached. Finally, the label attached to that leaf gives the predicted class of the instance.
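The build and classification procedures above can be sketched as a small recursive tree learner. As an assumption, it uses information gain (ID3-style) on discrete attributes, because the paper does not state which attribute selection measure its DT uses; trees are stored as nested dictionaries, a representation chosen only for this illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def build_tree(rows, labels, attrs):
    """Build procedure: start from the root and recursively add decision nodes.

    rows are attribute vectors, labels their classes, attrs the attribute indices
    still available for testing. A leaf is a class label; an inner node is a dict.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # pure subset or no attribute left: make a leaf

    def remainder(a):
        # Weighted entropy of the subsets produced by testing attribute a.
        parts = {}
        for row, label in zip(rows, labels):
            parts.setdefault(row[a], []).append(label)
        return sum(len(p) / len(labels) * entropy(p) for p in parts.values())

    best = min(attrs, key=remainder)  # lowest remainder = highest information gain
    node = {"attr": best, "default": majority, "branches": {}}
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return node


def classify(tree, row):
    """Classification procedure: follow matching branches from the root to a leaf."""
    while isinstance(tree, dict):
        # Attribute values unseen during building fall back to the node's majority class.
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


rows = [[1, 0], [1, 1], [0, 0], [0, 1]]  # toy 0/1 attributes, not patient data
labels = ["DHF", "DHF", "DF", "DF"]
tree = build_tree(rows, labels, [0, 1])
print(classify(tree, [1, 0]), classify(tree, [0, 1]))  # DHF DF
```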


2.2.5. Support vector machine (SVM) 

SVM is a supervised learning method for classification and regression. Its main idea is to map input vectors into a high-dimensional feature space and classify them there. Linear SVM seeks the decision hyperplane that maximizes the margin, that is, the distance between the hyperplane and the closest data points [27]. This study used the linear kernel defined in (2), where $x_i$ and $x_j$ are samples from the dataset.


$$ K(x_i, x_j) = x_i^{T} x_j \tag{2} $$
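A minimal sketch of a binary linear SVM built on the kernel in (2) follows. It minimizes the regularized hinge loss by full-batch subgradient descent; this training method and the hyperparameters `lam`, `lr`, and `epochs` are assumptions for illustration, not the solver used in the study. For three classes, a binary SVM like this one is typically combined through a one-vs-rest or one-vs-one scheme, and the paper does not say which was used.

```python
def linear_kernel(a, b):
    """Linear kernel of (2): K(x_i, x_j) = x_i^T x_j."""
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Binary soft-margin linear SVM fitted by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w^T x_i + b))
    for labels y_i in {-1, +1}. With the linear kernel, the decision
    function reduces to w^T x + b.
    """
    m, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wk for wk in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (linear_kernel(w, xi) + b) < 1:  # hinge loss active (inside the margin)
                for k in range(d):
                    grad_w[k] -= yi * xi[k] / m
                grad_b -= yi / m
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Sign of the decision function w^T x + b, as a label in {-1, +1}."""
    return 1 if linear_kernel(w, x) + b >= 0 else -1


# Toy linearly separable data (illustrative, not from the dengue dataset).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```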
2.3. Performance evaluation

The performance of each classification method was evaluated with three parameters, precision, recall, and accuracy [12], computed from the multi-class confusion matrix. Each parameter ranges from 0 to 100 (%), and values close to 100 indicate high performance. The parameters are defined as follows [29]:


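$$ \text{precision}_i = \frac{N_{ii}}{\sum_{j=1}^{n} N_{ji}} \times 100, \tag{3} $$

$$ \text{recall}_i = \frac{N_{ii}}{\sum_{j=1}^{n} N_{ij}} \times 100, \tag{4} $$

$$ \text{accuracy} = \frac{\sum_{i=1}^{n} N_{ii}}{\sum_{i=1}^{n} \sum_{j=1}^{n} N_{ij}} \times 100. \tag{5} $$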
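Equations (3)-(5) can be computed directly from a confusion matrix stored as a list of rows, with `N[i][j]` counting samples of actual class $A_i$ predicted as $A_j$, the convention given for $N_{ij}$ in this section. The example matrix is reconstructed from the description of Figure 5(b) (DT, k-fold 5: only 3 DHF samples classified as DF) and the class sizes of 40, 61, and 9. Averaging the per-class values weighted by class size reproduces the DT k-fold 5 row of Table 3; whether the study used this weighting is an assumption.

```python
def per_class_precision(N, i):
    """Eq. (3) for class A_i: N_ii over all samples predicted as A_i (column i); 0.0 if none."""
    predicted_i = sum(N[j][i] for j in range(len(N)))
    return 100 * N[i][i] / predicted_i if predicted_i else 0.0


def per_class_recall(N, i):
    """Eq. (4) for class A_i: N_ii over all samples that actually are A_i (row i); 0.0 if none."""
    actual_i = sum(N[i])
    return 100 * N[i][i] / actual_i if actual_i else 0.0


def accuracy(N):
    """Eq. (5): sum of the diagonal over the sum of all entries."""
    return 100 * sum(N[i][i] for i in range(len(N))) / sum(map(sum, N))


def weighted(metric, N):
    """Average a per-class metric, weighting each class by its number of actual samples."""
    total = sum(map(sum, N))
    return sum(metric(N, i) * sum(N[i]) for i in range(len(N))) / total


# Rows = actual, columns = predicted, in the order DF, DHF, DSS.
N = [[40, 0, 0],
     [3, 58, 0],
     [0, 0, 9]]
print(f"{weighted(per_class_precision, N):.1f}")  # 97.5
print(f"{weighted(per_class_recall, N):.1f}")     # 97.3
print(f"{accuracy(N):.1f}")                       # 97.3
```

With this weighting, recall always equals accuracy, which matches most rows of Table 3.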


A confusion matrix stores information about the actual and predicted classifications made by a classification system. It has two dimensions: one indexed by the actual class of an item and the other by the class predicted by the classifier. Figure 3 shows the basic structure of a confusion matrix for a multi-class problem with $n$ classes $A_1, A_2, \ldots, A_n$. The entry $N_{ij}$ is the number of samples that belong to class $A_i$ but are classified as class $A_j$ [29].
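Building the matrix from paired actual and predicted labels takes one pass over the data. The sketch below follows the $N_{ij}$ convention above (row = actual class, column = predicted class); the function name and the toy labels are illustrative.

```python
def confusion_matrix(actual, predicted, classes):
    """Return N with N[i][j] = number of samples of actual class classes[i]
    that the classifier assigned to classes[j]."""
    index = {c: k for k, c in enumerate(classes)}
    N = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        N[index[a]][index[p]] += 1
    return N


actual = ["DF", "DHF", "DHF", "DSS"]     # toy labels, not study results
predicted = ["DF", "DF", "DHF", "DSS"]
print(confusion_matrix(actual, predicted, ["DF", "DHF", "DSS"]))
# [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
```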

For cross-validation, the dengue patient dataset is divided into $k$ subsets. In each iteration, $(k-1)/k$ of the data are used for training and the remaining $1/k$ for testing, and the process is repeated $k$ times so that each subset is used once for testing. The mean of the $k$ validation results is taken as the final performance estimate. In this study, performance is measured using cross-validation with k-fold values of 3, 5, and 10.
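The k-fold procedure can be sketched as follows. The shuffling seed and the function names are assumptions, and `evaluate` stands for any hypothetical routine that trains on one index list and scores on the other. This simple split is not stratified: with only 9 DSS cases, at least one of the 10 folds necessarily has no DSS test sample, and an unstratified split can leave several folds without one. The text does not say whether stratified folds were used.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal folds.

    Yields (train_idx, test_idx): each fold is the test set exactly once,
    and the remaining (k-1)/k of the data form the training set.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_validate(n, k, evaluate):
    """Mean of the k scores returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k)]
    return sum(scores) / k


print([len(test) for _, test in k_fold_indices(110, 3)])   # [37, 37, 36]
print([len(test) for _, test in k_fold_indices(110, 10)])  # [11, 11, 11, 11, 11, 11, 11, 11, 11, 11]
```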


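[Figure 3 diagram: an n × n matrix with the actual classes $A_1, \ldots, A_j, \ldots, A_n$ along the columns and the classes $A_1, \ldots, A_j, \ldots, A_n$ along the rows; row $A_1$ holds $N_{11}, \ldots, N_{1j}, \ldots, N_{1n}$, row $A_j$ holds $N_{j1}, \ldots, N_{jj}, \ldots, N_{jn}$, and row $A_n$ holds $N_{n1}, \ldots, N_{nj}, \ldots, N_{nn}$]

Figure 3. Multi-class confusion matrix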


3. RESULTS AND DISCUSSION 

Pre-processing was performed by discretization, which converts all patient data into numeric form. The data collected from dengue patients contain 18 symptoms (S1-S18) together with thrombocyte (T) and hemoglobin (H) values, so a total of 20 attributes were used as input to the next process, classification. A symptom is given a value of 1 if the patient experiences it and 0 otherwise. The dengue diagnosis results are divided into three classes: DF, DHF, and DSS.

In the classification process, five methods were applied, C.45, DT, KNN, RF, and SVM, to identify the best-performing method. Performance was measured with three parameters, precision, recall, and accuracy, derived from the multi-class confusion matrix using cross-validation with three different k-fold values: 3, 5, and 10. The performance of the five classification methods for each k-fold value is summarized in Table 3.

Table 3 shows that KNN with k-fold 3 yielded the lowest performance, with precision, recall, and accuracy of 93.9%, 93.6%, and 93.6%, respectively. With k-fold 5 and 10, KNN reached 94.8% and 94.7% precision, respectively, and 94.5% recall and accuracy in both cases. RF with k-fold 10 and SVM with k-fold 3 and 10 achieved the highest performance, with precision, recall, and accuracy of 99.1% each.
As seen in Table 3, the k-fold value affects precision, recall, and accuracy. C.45 and KNN achieved their best accuracies, 97.3% and 94.5%, respectively, with both k-fold 5 and 10. DT achieved its best accuracy, 97.3%, with k-fold 5, while RF with k-fold 10 and SVM with k-fold 3 and 10 achieved the highest accuracy of 99.1%. Overall, k-fold 10 yields the best (or tied-best) results for every classifier except DT, which performs best with k-fold 5. Table 3 reflects the numbers of correctly and incorrectly classified data for each method.


Table 3. Dengue classification results using various classifiers and k-fold values

Classifier | K-fold | Precision (%) | Recall (%) | Accuracy (%)
C.45 | 3 | 94.5 | 94.5 | 95.5
C.45 | 5 | 97.5 | 97.3 | 97.3
C.45 | 10 | 97.5 | 97.3 | 97.3
DT | 3 | 94.5 | 94.5 | 95.5
DT | 5 | 97.5 | 97.3 | 97.3
DT | 10 | 95.5 | 95.5 | 95.5
KNN | 3 | 93.9 | 93.6 | 93.6
KNN | 5 | 94.8 | 94.5 | 94.5
KNN | 10 | 94.7 | 94.5 | 94.5
RF | 3 | 96.4 | 96.4 | 96.4
RF | 5 | 98.2 | 98.2 | 98.2
RF | 10 | 99.1 | 99.1 | 99.1
SVM | 3 | 99.1 | 99.1 | 99.1
SVM | 5 | 98.2 | 98.2 | 98.2
SVM | 10 | 99.1 | 99.1 | 99.1


The detailed results of each classifier are presented in Figures 4-8. Figure 4(a) depicts the confusion matrix of the C.45 classifier for k-fold 3, while Figure 4(b) depicts the confusion matrix for k-fold 5 and 10. As Figure 4(a) shows, k-fold 3 leads to 3 DF samples being misclassified as DHF and 3 DHF samples being misclassified as DF. The classification results of the DT with k-fold 3, 5, and 10 are shown in Figures 5(a)-(c). Figures 5(a) and 5(c) show that errors again occur in the DF and DHF classes, whereas with k-fold 5 (Figure 5(b)) errors occur only in the DHF class, where 3 samples are classified as DF. The results of KNN with k-fold 3, 5, and 10 are shown in Figure 6: misclassification occurs in the DF and DHF classes with k-fold 3 (Figure 6(a)) but in all classes with k-fold 5 and 10 (Figures 6(b) and 6(c)). For RF, misclassification occurs in all classes with k-fold 3 (Figure 7(a)) and in the DF and DHF classes with k-fold 5 (Figure 7(b)), whereas with k-fold 10 only 1 DHF sample is misclassified as DF (Figure 7(c)). Figure 8 presents the results of SVM: with k-fold 3 and 10 (Figure 8(a)), only 1 DHF sample is misclassified as DF, while with k-fold 5 (Figure 8(b)) 2 samples are misclassified, 1 in the DF class and 1 in the DHF class. Overall, Figures 4-8 show that the most frequent error is a DHF sample being classified as DF.
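The accuracies in Table 3 can be cross-checked against the misclassification counts given above, since accuracy is the number of correctly classified samples divided by all 110 samples. The sketch below uses only the counts stated explicitly in the text; the function name is illustrative.

```python
TOTAL = 110  # 40 DF + 61 DHF + 9 DSS cases (Section 2)


def accuracy_from_errors(wrong, total=TOTAL):
    """Accuracy (%) when `wrong` of `total` samples are misclassified."""
    return 100 * (total - wrong) / total


# Misclassification counts stated in the text for these settings.
errors = {
    "DT, k-fold 5": 3,
    "RF, k-fold 10": 1,
    "SVM, k-fold 3 and 10": 1,
    "SVM, k-fold 5": 2,
}

for setting, wrong in errors.items():
    print(f"{setting}: {accuracy_from_errors(wrong):.1f}%")
# DT, k-fold 5: 97.3%
# RF, k-fold 10: 99.1%
# SVM, k-fold 3 and 10: 99.1%
# SVM, k-fold 5: 98.2%
```

Each printed value matches the corresponding accuracy in Table 3.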
Figure 4. Confusion matrix of C.45 classifier with (a) k-fold 3 and (b) k-fold 5 and 10
Figure 5. Confusion matrix of DT classifier with (a) k-fold 3, (b) k-fold 5, and (c) k-fold 10
Figure 6. Confusion matrix of KNN classifier with (a) k-fold 3, (b) k-fold 5, and (c) k-fold 10
Figure 7. Confusion matrix of RF classifier with (a) k-fold 3, (b) k-fold 5, and (c) k-fold 10
Figure 8. Confusion matrix of SVM classifier with (a) k-fold 3 and 10 and (b) k-fold 5


4. CONCLUSION 
Dengue is a dangerous disease that can cause death, and the symptoms experienced by patients differ. There are three varieties of dengue: DF, DHF, and DSS. The disease needs to be detected early so that patients receive appropriate treatment. Dengue types can be classified using a machine learning approach. Five classifiers were used: C.45, DT, KNN, RF, and SVM. The input data for each classifier consist of 20 attributes: 18 symptoms experienced by patients and the thrombocyte (platelet) and hemoglobin values. Each classifier was evaluated using cross-validation with k-fold values of 3, 5, and 10. The evaluation results show that SVM with k-fold 3 and 10 achieved the highest accuracy, 99.1%. RF also achieved an accuracy of 99.1%, but only with k-fold 10, where the only error was 1 DHF sample classified as DF. This shows that the k-fold value can affect the classification results. Although their accuracies are lower, the other classifiers, C.45, KNN, and DT, also exceeded 90% accuracy. In further studies, these methods are expected to be applied to larger datasets, which may increase accuracy and make them better suited to predicting or classifying other diseases.

ISSN: 2252-8938